Functions for feature engineering

  1. missing_zero_values_table: drop columns whose percentage of null values exceeds a threshold
  2. get_zero_cols: remove columns for which more than 4 of the summary statistics are 0
  3. Multicolinearity: a class that removes multicollinear features
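
A minimal pandas sketch of the three helpers listed above. The original implementations are not shown, so the signatures, the 50% null threshold, the use of the `describe()` statistics, and the correlation rule in `Multicolinearity` (drop one feature from each pair with |r| > 0.9) are all assumptions for illustration. A VIF-based rule would be an equally reasonable alternative.

```python
import numpy as np
import pandas as pd


def missing_zero_values_table(df: pd.DataFrame, threshold: float = 50.0) -> pd.DataFrame:
    """Drop columns whose % of missing values exceeds `threshold` (assumed 50%)."""
    missing_pct = df.isnull().mean() * 100  # % of NaNs per column
    return df.loc[:, missing_pct <= threshold]


def get_zero_cols(df: pd.DataFrame, max_zero_stats: int = 4) -> list:
    """Return numeric columns where more than `max_zero_stats` summary stats are 0.

    Assumption: the stats are describe()'s mean, std, min, 25%, 50%, 75%, max.
    """
    stats = df.describe().drop(index="count")
    zero_counts = (stats == 0).sum(axis=0)
    return zero_counts[zero_counts > max_zero_stats].index.tolist()


class Multicolinearity:
    """Drop one feature from every numeric pair whose |Pearson r| exceeds `threshold`."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.to_drop_ = []

    def fit(self, df: pd.DataFrame) -> "Multicolinearity":
        corr = df.select_dtypes(include="number").corr().abs()
        # Keep only the upper triangle so each pair is checked once
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        self.to_drop_ = [c for c in upper.columns if (upper[c] > self.threshold).any()]
        return self

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.drop(columns=self.to_drop_)


# Demo on a toy frame
df_demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],           # b = 2 * a, perfectly collinear
    "c": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
    "z": [0.0, 0.0, 0.0, 0.0],           # every statistic is 0
})
kept = missing_zero_values_table(df_demo)
zero_cols = get_zero_cols(df_demo)
mc = Multicolinearity(threshold=0.9).fit(df_demo)
```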

Function to get the final df after feature engineering (get_final_df)

  1. Separate the data into a categorical df and a numerical df
  2. Generate summary statistics for the numerical df to identify outliers
  3. Use box plots to find and remove outliers in the categorical df and violin plots for the numerical df
  4. Remove multicollinear features and columns with too many null values
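
The get_final_df steps can be sketched as a single function. This is a hypothetical version: the box and violin plots are a visual step and are not reproduced here, so an IQR rule on the `describe()` quartiles stands in for outlier removal, and the null and correlation thresholds are assumed defaults.

```python
import numpy as np
import pandas as pd


def get_final_df(df, null_threshold=50.0, corr_threshold=0.9, iqr_k=1.5):
    """Hypothetical sketch of get_final_df; thresholds are assumptions."""
    # Separate categorical and numerical columns
    cat_df = df.select_dtypes(exclude="number")
    num_df = df.select_dtypes(include="number")

    # Use describe() quartiles for an IQR outlier rule (stand-in for the plots)
    stats = num_df.describe()
    q1, q3 = stats.loc["25%"], stats.loc["75%"]
    iqr = q3 - q1
    lower, upper = q1 - iqr_k * iqr, q3 + iqr_k * iqr
    # Keep rows whose numeric values are all in bounds; NaNs do not disqualify a row
    in_range = ((num_df >= lower) & (num_df <= upper)) | num_df.isna()
    mask = in_range.all(axis=1)
    out = pd.concat([cat_df[mask], num_df[mask]], axis=1)

    # Drop columns whose missing percentage exceeds null_threshold
    out = out.loc[:, out.isnull().mean() * 100 <= null_threshold]

    # Drop one column from each highly correlated numeric pair
    corr = out.select_dtypes(include="number").corr().abs()
    upper_tri = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper_tri.columns if (upper_tri[c] > corr_threshold).any()]
    return out.drop(columns=to_drop)


# Demo: row 5 is an outlier, y = 2x is collinear, n becomes fully null after filtering
raw = pd.DataFrame({
    "city": ["a", "b", "a", "b", "a", "b"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 100.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0, 200.0],
    "n": [np.nan] * 5 + [1.0],
})
df_final = get_final_df(raw)
```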

Data Preparation

  1. Load the data into data frames
  2. Join the two data frames to get the final dataset
  3. Impute missing values in each column with the column mean
  4. Apply get_final_df to perform feature engineering and return df_input
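
A minimal sketch of the join and mean-imputation steps. The helper name `prepare_data`, the join key `id`, and the inner join are assumptions, since the source does not say how the two data frames are related. The key column is excluded from imputation.

```python
import pandas as pd


def prepare_data(left: pd.DataFrame, right: pd.DataFrame, key: str = "id") -> pd.DataFrame:
    """Join two frames on `key` (assumed) and mean-impute the numeric columns."""
    df = left.merge(right, on=key, how="inner")
    num_cols = df.select_dtypes(include="number").columns.drop([key], errors="ignore")
    # Replace each numeric column's NaNs with that column's mean
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
    return df


# Demo on two toy frames sharing an "id" column
left = pd.DataFrame({"id": [1, 2, 3], "a": [1.0, None, 3.0]})
right = pd.DataFrame({"id": [1, 2, 3], "b": [10.0, 20.0, None]})
merged = prepare_data(left, right)
```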

Exploratory Data Analysis

  1. Use the AutoViz package to perform EDA and check whether any feature transformations are required
  2. The EDA below shows that only feature scaling and one-hot encoding are required, so we apply them before model training
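
One way to apply the scaling and one-hot encoding is a scikit-learn `ColumnTransformer`. This is a sketch, assuming numeric columns are standardized with `StandardScaler` and every non-numeric column is one-hot encoded; `build_preprocessor` is a hypothetical helper name.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(df: pd.DataFrame) -> ColumnTransformer:
    """Scale numeric columns and one-hot encode the rest (assumed split by dtype)."""
    num_cols = df.select_dtypes(include="number").columns.tolist()
    cat_cols = df.select_dtypes(exclude="number").columns.tolist()
    return ColumnTransformer(
        [
            ("num", StandardScaler(), num_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ],
        sparse_threshold=0,  # always return a dense array
    )


# Demo: one numeric and one categorical column
X_demo = pd.DataFrame({"x": [1.0, 2.0, 3.0], "city": ["a", "b", "a"]})
pre = build_preprocessor(X_demo)
Z = pre.fit_transform(X_demo)  # column 0: scaled x; columns 1-2: one-hot city
```

Fitting the transformer on the training split only and reusing it on the test data keeps test statistics out of the scaler.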

Hyperparameter Tuning to Find the Best Hyperparameters
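
A hedged sketch of the tuning step with scikit-learn's `GridSearchCV`. The source does not state the task type, the search method, or the grid, so `GradientBoostingClassifier`, the parameter values, and 3-fold cross-validation are assumptions; `GradientBoostingRegressor` would replace it for a regression target. The demo data from `make_classification` is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV


def tune_gbdt(X, y, cv=3):
    """Grid-search a few GBDT hyperparameters and return the best ones."""
    # Illustrative grid; the real search space is not given in the source
    param_grid = {
        "n_estimators": [50, 100],
        "learning_rate": [0.05, 0.1],
        "max_depth": [2, 3],
    }
    search = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid, cv=cv)
    search.fit(X, y)
    return search.best_params_, search.best_score_


# Demo on synthetic data (replace with the df_input features and target)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
best_params, best_score = tune_gbdt(X, y)
print(best_params, round(best_score, 3))
```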

Apply Model

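  1. GBDT (gradient-boosted decision trees) models non-linear relationships well, and boosting reduces bias, which makes it a strong choice for this non-linear, high-bias dataset
  2. Train GBDT with the best hyperparameters from tuning to get the output
  3. The model can be fine-tuned further with other hyperparameter search techniques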
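
Training the final GBDT with the tuned parameters might look like the sketch below. `best_params` is a placeholder dict standing in for the output of the tuning step, and the synthetic data and 80/20 validation split are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder: in practice these values come from the tuning step
best_params = {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 3}

# Synthetic stand-in for the prepared features and target
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = GradientBoostingClassifier(random_state=0, **best_params)
model.fit(X_train, y_train)
val_acc = accuracy_score(y_val, model.predict(X_val))
print(f"validation accuracy: {val_acc:.3f}")
```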

Predict on the Test Datasets and Extract the Output
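
A minimal sketch of the prediction and export step, assuming the test set carries an `id` column that is dropped before prediction and kept in the output. `predict_and_export`, the column names, and the tiny training set are hypothetical; passing a file path as `path` writes the CSV to disk instead of returning it as text.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier


def predict_and_export(model, test_df, id_col="id", path=None):
    """Predict on test_df and pair each prediction with its id (assumed column)."""
    features = test_df.drop(columns=[id_col])
    output = pd.DataFrame({id_col: test_df[id_col], "prediction": model.predict(features)})
    csv_text = output.to_csv(path, index=False)  # returns a string only when path is None
    return output, csv_text


# Demo: the target is 1 exactly when x1 > 3.5
train = pd.DataFrame({
    "x1": [0, 1, 2, 3, 4, 5, 6, 7],
    "x2": [1, 1, 0, 0, 1, 1, 0, 0],
    "target": [0, 0, 0, 0, 1, 1, 1, 1],
})
model = GradientBoostingClassifier(random_state=0).fit(train[["x1", "x2"]], train["target"])
test_df = pd.DataFrame({"id": [101, 102], "x1": [0.5, 6.5], "x2": [1, 0]})
output, csv_text = predict_and_export(model, test_df)
```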